DNA Research 20, 593-603, (2013)
Advance Access publication on 31 July 2013



doi:10.1093/dnares/dst033



Genome-Wide Association Studies Using Single Nucleotide 
Polymorphism Markers Developed by Re-Sequencing 
of the Genomes of Cultivated Tomato 

Kenta Shirasawa 1,*, Hiroyuki Fukuoka 2, Hiroshi Matsunaga 2, Yuhko Kobayashi 3, Issei Kobayashi 3,
Hideki Hirakawa 1 , Sachiko Isobe 1 , and Satoshi Tabata 1 

Kazusa DNA Research Institute, 2-6-7 Kazusa-Kamatari, Kisarazu, Chiba 292-0818, Japan 1; NARO Institute
of Vegetable and Tea Sciences, 360 Kusawa, Ano, Tsu, Mie 514-2392, Japan 2; and Life Science Research Center,
Mie University, 1577 Kurimamachiya, Tsu, Mie 514-8507, Japan 3

*To whom correspondence should be addressed. Tel. +81 438-52-3935. Fax. +81 438-52-3934. 
E-mail: shirasaw@kazusa.or.jp 

Edited by Dr Katsumi Isono 

(Received 4 June 2013; accepted 5 July 2013)

Abstract 

With the aim of understanding the relationship between genetic and phenotypic variations in cultivated
tomato, single nucleotide polymorphism (SNP) markers covering the whole genome of cultivated tomato
were developed, and genome-wide association studies (GWAS) were performed. The whole genomes of six
tomato lines were sequenced with the ABI 5500xl SOLiD sequencer. Sequence reads covering ~13.7× of
the genome for each line were obtained and mapped onto the tomato reference genome (SL2.40) to detect
~1.5 million SNP candidates. Of the identified SNPs, 1.5% were predicted to affect gene function. In
the subsequent Illumina GoldenGate assay for 1536 SNPs, 1293 SNPs were successfully genotyped, and
1248 showed polymorphisms among 663 tomato accessions. Whole-genome linkage disequilibrium
(LD) analysis revealed a marked difference in LD decay between euchromatic (58 kb) and heterochromatic
regions (13.8 Mb). Subsequent GWAS identified SNPs that were significantly associated with agronomic
traits, with SNP loci located near genes that were previously reported as candidates for these traits. This
study demonstrates that promising loci can be identified by performing GWAS with a large number of SNPs
obtained from re-sequencing analysis.

Key words: genome-wide association studies; linkage disequilibrium; whole-genome re-sequencing; single 
nucleotide polymorphism; tomato 



1. Introduction 

Tomato (Solanum lycopersicum), an important crop,
originated in South and Central America and spread
to the rest of the world, accompanied by morphological
diversification. 1 The Solanaceae family, to which tomato
belongs, includes other important crop species, such as
potato (S. tuberosum), eggplant (S. melongena), tobacco
(Nicotiana tabacum), and pepper (Capsicum annuum).
Comparative genomics within these genera and species
has greatly accelerated understanding of their genome
evolution and of the genetic mechanisms that confer
phenotypic diversity on these species. 2 Furthermore,
several interspecific genetic linkage maps have been
constructed between cultivated tomato and its wild
relatives (S. chmielewskii, S. habrochaites, S. pennellii,
and S. pimpinellifolium). 3 These maps allow identification
of the genes responsible for interspecific phenotypic
variations, including disease resistance, fruit size and
shape, and plant architecture. 3 However, few genetic
studies have reported intraspecific variations, owing to
the narrow genetic diversity of cultivated tomato. 3,4

©The Author 2013. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by-nc/
3.0/), which permits non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial
re-use, please contact journals.permissions@oup.com

In human and animal genomics and genetics, the
availability of whole-genome sequence data has driven
more rapid advances in re-sequencing analysis and
genome-wide association studies (GWAS) than in
classical genetics and quantitative trait locus (QTL)
mapping. 5 In plants, rice (Oryza sativa) and Arabidopsis
thaliana, the first plant species for which whole-genome
sequences became available, have been representative
targets for such analyses. 7-10

Tomato has also been used as a model plant in classical
and molecular genetics, 11 owing to its autogamy, diploidy
(2n = 2x = 24), and relatively compact genome (~950 Mb).
Recently, the whole-genome sequence of tomato was
published. 12 Furthermore, Hirakawa et al. 13 inferred
the functions of 200 SNPs among the transcribed
sequences of cultivated tomato lines by determining
their positions in predicted genes on the tomato genome.
These results are expected to accelerate understanding
of the genetic mechanisms that confer phenotypic
variation among tomato cultivars.

Massively parallel sequencing and genotyping
methods have contributed to progress in genetics and
genomics. Next-generation sequencers (NGSs), such
as the HiSeq 2500 (Illumina), the GS FLX+ system
(Roche), the 5500xl SOLiD (Life Technologies), and the
Ion Proton (Life Technologies), have been employed for
de novo assembly of genome sequences and for
re-sequencing analyses of the genomes of several
organisms. 14,15 In re-sequencing analysis, sequence
reads from the whole genome are mapped onto the
reference genome to identify nucleotide variations,
including single nucleotide polymorphisms (SNPs) and
insertions/deletions (indels). 14 NGS technologies can
simultaneously generate large amounts of nucleotide
sequence data (up to Mb- or Gb-scale) that redundantly
cover the whole-genome sequence. This allows a huge
number of nucleotide variations to be identified cheaply
and within a relatively short period of time. The
identified SNPs can be used, for example, for polymorphic
analysis of germplasm collections, which, in turn, enables
genetic analyses such as QTL mapping, GWAS, and
genomic selection. 16 Large-scale SNP genotyping is often
performed with commercially available array-based
platforms, such as Infinium (Illumina), GoldenGate
(Illumina), and Axiom Genotyping Solution (Affymetrix).

Tomato accessions, so-called genetic resources, are
stocked in several gene banks, including the Tomato
Genetic Resource Center (TGRC), USA (http://tgrc.ucdavis.edu);
the National Institute of Agrobiological Sciences (NIAS)
Genebank, Japan (http://www.gene.affrc.go.jp); and the
NARO Institute of Vegetable and Tea Science (NIVTS),
Japan (http://www.naro.affrc.go.jp/vegetea). Over 1500
tomato lines from >50 countries have been deposited in
the NIAS and NIVTS Genebanks. The morphological
traits of each line are recorded when the plants are
reproduced, whereas DNA-based genetic variation has
not yet been evaluated. By combining massively parallel
sequencing and high-throughput genotyping technologies,
it is now possible to probe genome-wide genetic diversity
in the large number of tomato accessions currently
available. In addition, associations between genetic and
phenotypic variations can be identified in these genetic
resources by using the morphological traits recorded in
the NIVTS and NIAS Genebanks. Such studies would
provide useful knowledge for molecular genetic analysis
and breeding. In this study, we re-sequenced six tomato
lines to discover novel SNPs and to estimate the
proportion of SNPs potentially contributing to
phenotypic variation. The identified candidate SNPs
were used for GWAS to predict the loci responsible for
agronomically important traits, e.g. fruit size and shape
and plant architecture.



2. Materials and Methods 

2.1. Plant materials and DNA isolation

Six inbred lines, 'Ailsa Craig' (AIC), 'Furikoma' (FRK),
'M82' (M82), 'Tomato Chuukanbonhon Nou 11' (PL11),
'Ponderosa' (PON), and 'Regina' (REG), which were
selected as representative lines from the clusters in the
phylogenetic tree obtained in our previous study, 13 were
used for whole-genome re-sequencing (Supplementary
Table S1). AIC and PON are greenhouse types, and FRK
and M82 are processing types suited to field cultivation.
PL11 is a breeding material developed at the NIVTS for a
short-internode trait, 17 and REG is a dwarf tomato with
cherry-type fruits obtained from Sakata Seeds Co., Japan.
All materials except REG are available from the NIVTS,
Japan.

In total, 663 tomato accessions were genotyped with
SNPs, of which 641, 9, 6, 5, 1, and 1 were derived from
the NIVTS, Japan; five private companies (De Ruiter
Seeds Co., The Netherlands; Sakata Seeds Co., Japan;
Suntory Holdings Ltd., Japan; Takii Seeds Co., Japan;
and Vilmorin Seeds Co., France); the TGRC at the
University of California, USA; the National BioResource
Project (NBRP) at the University of Tsukuba, Japan;
Cornell University, USA; and the Institut National de la
Recherche Agronomique (INRA), France, respectively
(Supplementary Table S1). Total genomic DNA was
isolated from the leaves of a single plant of each line
using a DNeasy Plant Mini Kit (Qiagen).
2.2. Whole-genome re-sequencing and identification of SNP candidates

Total genomic DNA from the six lines, namely AIC,
FRK, M82, PL11, PON, and REG, was used for
whole-genome shotgun sequencing according to the
standard protocol (Life Technologies). The nucleotide
sequences were determined using the 5500xl SOLiD
sequencer (Life Technologies) in paired-end mode
(35 + 75 bases). The data obtained were mapped onto
the reference genome sequence of 'Heinz 1706' (H1706)
ver. SL2.40 12 for SNP discovery using the LifeScope
Genomic Analysis software (Life Technologies) with
default parameters. SNPs that were heterozygous in any
of the six lines were manually excluded from the list of
SNP candidates.
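The exclusion rule above, combined with the removal of triallelic sites reported in Section 3.2, amounts to keeping only sites where every line is homozygous and exactly one non-reference allele is observed. The following is a minimal Python sketch of that rule, not the LifeScope pipeline; the per-line layout of two-letter genotype strings and the function name are hypothetical.

```python
from typing import Dict, Optional

# Hypothetical layout: line name -> two-letter genotype ("AA" homozygous,
# "AG" heterozygous), or None when the line has no call at this site.
Calls = Dict[str, Optional[str]]


def is_confident_biallelic(ref: str, calls: Calls) -> bool:
    """True if no line is heterozygous and exactly one alternate allele is observed."""
    alt_alleles = set()
    for genotype in calls.values():
        if genotype is None:
            continue                      # missing call: ignore this line
        if genotype[0] != genotype[1]:
            return False                  # heterozygous in any line -> exclude the site
        if genotype[0] != ref:
            alt_alleles.add(genotype[0])
    return len(alt_alleles) == 1          # 0 = no SNP, >1 = triallelic (excluded)


sites = [
    ("A", {"AIC": "AA", "FRK": "GG", "M82": "GG", "PL11": "AA", "PON": "AA", "REG": "GG"}),
    ("C", {"AIC": "CT", "FRK": "CC", "M82": "TT", "PL11": "CC", "PON": "CC", "REG": "TT"}),
    ("G", {"AIC": "AA", "FRK": "TT", "M82": "GG", "PL11": "GG", "PON": None, "REG": "GG"}),
]
kept = [i for i, (ref, calls) in enumerate(sites) if is_confident_biallelic(ref, calls)]
print(kept)  # [0]: site 1 is heterozygous in AIC, site 2 is triallelic
```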

The SNP candidates were classified into seven groups
according to the ITAG2.3 gene predictions on the tomato
genome 12 as follows: intergenic SNPs, SNPs at donor and
acceptor splice sites (the two bases at each end of an
intron), intron SNPs, SNPs in untranslated regions
(UTRs), synonymous SNPs, missense SNPs, and nonsense
SNPs. Functional categories were assigned to the tomato
genes predicted in ITAG2.3 12 by BLASTP 18 searches
against the eukaryotic orthologous groups (KOG)
database (http://www.ncbi.nlm.nih.gov/COG), with an
E-value cut-off of 1E-4. 19
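For SNPs inside coding sequences, the synonymous, missense, and nonsense classes follow from translating the reference and alternate codons with the standard genetic code. The sketch below is a simplified illustration, not the ITAG-based pipeline used in the study: it assumes the SNP position is given relative to an in-frame, sense-strand coding sequence, and it does not handle the splice-site, intron, UTR, and intergenic classes, which require gene-model coordinates. The function name is hypothetical.

```python
from itertools import product

# Standard genetic code in TCAG order (first base varies slowest); '*' marks stop codons.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_cds_snp(cds: str, pos: int, alt: str) -> str:
    """Classify a SNP at 0-based position `pos` of an in-frame, sense-strand CDS."""
    start = pos - pos % 3                     # first base of the affected codon
    ref_codon = cds[start:start + 3]
    offset = pos % 3
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"                     # premature stop codon
    return "missense"


cds = "ATGTGGCTA"                              # toy CDS: Met-Trp-Leu
print(classify_cds_snp(cds, 8, "G"))           # CTA -> CTG (Leu -> Leu): synonymous
print(classify_cds_snp(cds, 5, "A"))           # TGG -> TGA (Trp -> stop): nonsense
print(classify_cds_snp(cds, 0, "C"))           # ATG -> CTG (Met -> Leu): missense
```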

SNP2CAPS 20 and dCAPS Finder 2.0 21 were used for 
developing cleaved amplified polymorphic sequence 
(CAPS) and derived CAPS (dCAPS) markers, respectively. 
Oligonucleotides for the markers were designed using 
the PRIMER3 software. 22 
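The principle behind CAPS conversion is that a SNP creates or abolishes a restriction site in one allele, so digestion of the PCR product distinguishes the alleles. A minimal sketch of that check follows. It is not SNP2CAPS itself; the four-enzyme list is an illustrative subset, and only palindromic recognition sites are used so that the reverse strand need not be scanned separately.

```python
# Illustrative subset of palindromic recognition sites; a real screen would use a
# full enzyme database (e.g. REBASE) including non-palindromic and degenerate sites.
ENZYMES = {"EcoRI": "GAATTC", "HindIII": "AAGCTT", "TaqI": "TCGA", "MseI": "TTAA"}


def caps_candidates(left: str, ref: str, alt: str, right: str) -> list:
    """Enzymes whose number of recognition sites differs between the two alleles
    of a SNP flanked by `left` and `right`; each could make the SNP a CAPS marker."""
    ref_seq = left + ref + right
    alt_seq = left + alt + right
    return sorted(
        name for name, site in ENZYMES.items()
        if ref_seq.count(site) != alt_seq.count(site)
    )


print(caps_candidates("CCGAA", "T", "C", "TCGG"))  # ['EcoRI']: GAATTC is lost in the C allele
```

When no enzyme distinguishes the alleles, a dCAPS design introduces a mismatch in the PCR primer to create an allele-specific site, which is the case dCAPS Finder addresses.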

2.3. SNP genotyping 

A total of 1536 SNPs were selected for Illumina
GoldenGate SNP genotyping of the 663 tomato
accessions. The Illumina GoldenGate assay and
subsequent SNP calling were performed as described by
Shirasawa et al. 23 Polymorphic analysis of CAPS and
dCAPS markers, including FAS, SP, and OVATE, 23,24 was
performed as described by Shirasawa et al. 23

2.4. Data analysis 

2.4.1. Clustering of the genetic resources

The genetic distances and Jaccard's similarity
coefficients for all pairs of accessions were calculated
from the genotypic data using the GGT2 software 25 as
described by Shirasawa et al. A dendrogram of the
genetic resources was constructed using the
neighbor-joining method in the MEGA5 software.
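For homozygous (inbred) genotype calls, a simple allele-mismatch distance and a Jaccard coefficient on allele presence can be sketched as follows. These are generic definitions shown for illustration, not necessarily the exact formulas implemented in GGT2; SNPs missing in either line are skipped, and the function names are hypothetical.

```python
from typing import Optional, Sequence

Genotype = Sequence[Optional[str]]  # one allele letter per SNP for an inbred line; None = missing


def _scored(x: Genotype, y: Genotype):
    """(index, allele_x, allele_y) for SNPs scored in both lines."""
    return [(i, a, b) for i, (a, b) in enumerate(zip(x, y)) if a is not None and b is not None]


def allele_mismatch_distance(x: Genotype, y: Genotype) -> float:
    """Proportion of jointly scored SNPs at which the two lines carry different alleles."""
    scored = _scored(x, y)
    return sum(a != b for _, a, b in scored) / len(scored)


def jaccard_similarity(x: Genotype, y: Genotype) -> float:
    """Jaccard coefficient treating each (SNP index, allele) pair as a present 'band'."""
    scored = _scored(x, y)
    bands_x = {(i, a) for i, a, _ in scored}
    bands_y = {(i, b) for i, _, b in scored}
    return len(bands_x & bands_y) / len(bands_x | bands_y)


line1 = ["A", "G", "T", None]
line2 = ["A", "C", "T", "G"]
print(allele_mismatch_distance(line1, line2))  # 1 of 3 jointly scored SNPs differ
print(jaccard_similarity(line1, line2))        # 0.5: 2 shared bands out of 4
```

The resulting pairwise distance matrix is the kind of input that a neighbor-joining program such as MEGA5 uses to build the dendrogram.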

Principal component analysis (PCA) was also per- 
formed to determine the relationship between 
samples using the TASSEL software, 28 in which SNPs 
with minor allele frequencies (MAFs) of <0.05 were 
removed and the number of components was limited 
to three. 
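The PCA step (dropping SNPs with MAF < 0.05 and keeping three components) can be illustrated with a generic PCA on centred allele dosages computed by singular value decomposition. This is a sketch under stated assumptions, not TASSEL's implementation: it assumes 0/1/2 dosage coding with no missing values, and it uses random toy data rather than the study's genotypes.

```python
import numpy as np


def pca_scores(dosage: np.ndarray, min_maf: float = 0.05, n_components: int = 3):
    """dosage: accessions x SNPs matrix of allele counts (0, 1, 2), no missing values.
    Removes SNPs with MAF < min_maf, centres each SNP, and returns the first
    `n_components` PC scores and the proportion of variance each one explains."""
    freq = dosage.mean(axis=0) / 2.0
    maf = np.minimum(freq, 1.0 - freq)
    kept = dosage[:, maf >= min_maf].astype(float)
    centred = kept - kept.mean(axis=0)
    u, s, _ = np.linalg.svd(centred, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    return u[:, :n_components] * s[:n_components], explained[:n_components]


rng = np.random.default_rng(0)
toy = rng.integers(0, 3, size=(30, 200))   # 30 hypothetical accessions x 200 SNPs
scores, explained = pca_scores(toy)
print(scores.shape, explained.round(3))
```

Small explained-variance proportions for the leading components, such as those reported in Section 3.4, indicate that no few axes dominate the genetic variation.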



The STRUCTURE software, 29 in which SNPs with MAFs
of >0.00 were included, was used to assess the genetic
relationships of the investigated lines. The degree of
admixture in each line was estimated with a burn-in
period of 100 000 and 100 000 Markov chain Monte Carlo
replications. The optimal number of clusters (K) was
estimated from the output of 20 independent runs as
described by Evanno et al. 30
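The Evanno statistic scores each K by the absolute second difference of the log-likelihood ln P(D) across K, divided by the standard deviation of ln P(D) among replicate runs at that K. The sketch below takes the second difference of the mean ln P(D) across replicates; the input values are hypothetical, with two runs per K instead of the 20 used in the study.

```python
import statistics


def evanno_delta_k(ln_p: dict) -> dict:
    """ln_p maps K -> list of ln P(D) values from replicate STRUCTURE runs.
    Returns delta K = |L(K+1) - 2 L(K) + L(K-1)| / sd(L(K)) for each interior K,
    using the mean ln P(D) across replicates for the second difference."""
    mean = {k: statistics.mean(v) for k, v in ln_p.items()}
    delta = {}
    for k in sorted(ln_p):
        if k - 1 in ln_p and k + 1 in ln_p:
            second_diff = abs(mean[k + 1] - 2 * mean[k] + mean[k - 1])
            delta[k] = second_diff / statistics.stdev(ln_p[k])
    return delta


# Hypothetical ln P(D) values, two replicate runs per K.
runs = {1: [-1000, -1002], 2: [-900, -904], 3: [-880, -882], 4: [-875, -879]}
delta_k = evanno_delta_k(runs)
best_k = max(delta_k, key=delta_k.get)
print(best_k)  # 2
```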

2.4.2. Linkage disequilibrium and haplotyping analysis

Linkage disequilibrium (LD) between all SNP pairs on
each chromosome was estimated using the Haploview
software 31 with the following parameters: MAF, >0.05;
Hardy-Weinberg P-value cut-off, 0; and proportion of
genotyped lines, >0.75. Haplotypes and tag SNPs were
predicted from the estimated LD blocks, which were
defined according to Gabriel et al. 32
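Because the accessions are largely inbred, each genotype call can be treated as a haplotype, and the r² measure of LD can be computed directly from allele counts. Haploview estimates haplotype frequencies from the genotype data itself; the function below is only a simplified sketch that assumes fully homozygous calls. The function name is hypothetical.

```python
def r_squared(snp1, snp2):
    """LD r^2 between two biallelic SNPs scored in homozygous lines.
    Each argument lists one allele letter per line (None = missing); lines missing
    either SNP are skipped. Returns nan if either SNP is monomorphic in the rest."""
    pairs = [(a, b) for a, b in zip(snp1, snp2) if a is not None and b is not None]
    n = len(pairs)
    allele1, allele2 = pairs[0]            # reference alleles chosen arbitrarily
    p1 = sum(a == allele1 for a, _ in pairs) / n
    p2 = sum(b == allele2 for _, b in pairs) / n
    p12 = sum(a == allele1 and b == allele2 for a, b in pairs) / n
    d = p12 - p1 * p2                      # coefficient of linkage disequilibrium D
    denom = p1 * (1 - p1) * p2 * (1 - p2)
    return d * d / denom if denom > 0 else float("nan")


print(r_squared(list("AAGGAAGG"), list("CCTTCCTT")))  # 1.0 (complete LD)
print(r_squared(list("AAGGAAGG"), list("CTCTCTCT")))  # 0.0 (no LD)
```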

2.4.3. Genome-wide association studies

Associations between genotypes and phenotypes were
analysed with the mixed linear model (MLM) implemented
in the TASSEL program, 28 using SNPs with MAF of >0.05.
In the association analysis, the kinship matrix based on
the SNP data was included in the MLM, whereas
population structure was excluded from the model
because none could be detected in the tomato accessions
by the STRUCTURE analysis. The thresholds for
association were set to -log10(P) > 5.06 and > 4.36 at
significance levels of 1% and 5%, respectively, after
Bonferroni multiple-test correction.
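Both thresholds are consistent with a Bonferroni correction over the SNPs retained at MAF > 0.05. Taking n = 1147, the number of such SNPs reported in Section 3.3, is an assumption on our part, but it reproduces both values:

```python
import math


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """-log10 of the per-test P-value cut-off after Bonferroni correction."""
    return -math.log10(alpha / n_tests)


n_snps = 1147  # assumption: SNPs with MAF > 0.05 (Section 3.3)
print(round(bonferroni_threshold(0.01, n_snps), 2))  # 5.06
print(round(bonferroni_threshold(0.05, n_snps), 2))  # 4.36
```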

In the NIAS Genebank databases (http://www.gene.affrc.go.jp),
71 phenotypic traits are registered for 9-479 accessions
(the number of investigated lines differs depending on
the trait) as actual measured numeric data, qualitative
data, and ranked data. The traits were investigated in the
field and/or under greenhouse conditions over a number
of years (1983-2011) at multiple locations (seven sites in
Japan and Taiwan). The phenotypic data for each
accession recorded repeatedly across years and locations
were averaged, so that the data could be regarded as
continuous numerical data for the MLM. Of these, 23
traits that were scored in >100 of the lines genotyped in
this study were tested in the GWAS.
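The averaging and trait-selection steps can be sketched as below. The (trait, accession, value) record layout, accession names, and function names are hypothetical; the genebank databases are not assumed to expose data in this form.

```python
from collections import defaultdict
from statistics import mean


def accession_means(records):
    """records: (trait, accession, value) tuples, one per year/site observation.
    Returns {trait: {accession: mean value across all observations}}."""
    grouped = defaultdict(lambda: defaultdict(list))
    for trait, accession, value in records:
        grouped[trait][accession].append(value)
    return {trait: {acc: mean(vals) for acc, vals in accs.items()}
            for trait, accs in grouped.items()}


def traits_for_gwas(means, genotyped, min_lines=100):
    """Traits scored in more than `min_lines` of the genotyped accessions."""
    return sorted(trait for trait, accs in means.items()
                  if len(genotyped.intersection(accs)) > min_lines)


records = [
    ("plant height", "acc1", 120.0), ("plant height", "acc1", 130.0),
    ("plant height", "acc2", 90.0), ("locule number", "acc1", 3.0),
]
means = accession_means(records)
print(means["plant height"])                                 # {'acc1': 125.0, 'acc2': 90.0}
print(traits_for_gwas(means, {"acc1", "acc2"}, min_lines=1))  # ['plant height']
```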



3. Results 

3.1. Whole-genome shotgun re-sequencing of cultivated tomato

Whole-genome shotgun re-sequencing was performed
for the six inbred tomato lines, namely AIC, FRK, M82,
PL11, PON, and REG. DNA samples tagged with
line-specific index sequences were sequenced using the
5500xl SOLiD sequencer (Life Technologies) in paired-end
mode (75 + 35 bases) (Table 1). A total of 708.3 million
read pairs, corresponding to 77.9 Gb of DNA, were
obtained (13.7× mean depth for each line). In the
subsequent in silico analysis with the LifeScope Genomic
Analysis software (Life Technologies), 53.9% of the
obtained sequences covered 93.4% of the reference
genome sequence of H1706 ver. SL2.40 12 at 9.2×
coverage on average for each line (Table 1 and
Supplementary Table S2). The remaining 46.1% of reads
were omitted from the mapping results owing to low read
quality and repetitive sequences in the tomato genome.
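Footnote a of Table 1 defines re-sequencing depth as total sequence length divided by the 950 Mb genome size. As an arithmetic check, applying that definition to the per-line totals in Table 1 reproduces the depths listed there and the 13.7× mean:

```python
GENOME_SIZE_BP = 950_000_000  # tomato genome size used in Table 1, footnote a

total_bp = {
    "AIC": 11_540_467_730, "FRK": 10_119_550_210, "M82": 11_794_867_810,
    "PL11": 13_392_753_440, "PON": 10_384_538_450, "REG": 20_682_915_550,
}
depth = {line: round(bp / GENOME_SIZE_BP, 1) for line, bp in total_bp.items()}
mean_depth = round(sum(total_bp.values()) / len(total_bp) / GENOME_SIZE_BP, 1)
print(depth)       # {'AIC': 12.1, 'FRK': 10.7, 'M82': 12.4, 'PL11': 14.1, 'PON': 10.9, 'REG': 21.8}
print(mean_depth)  # 13.7
```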

3.2. Identification of SNP candidates and their positions on the tomato genome

Within the mapped sequence reads, a total of 2 011 984
SNP candidates were discovered between H1706
(SL2.40ch01 to SL2.40ch12) and the re-sequenced lines.
Heterozygous and triallelic SNPs were often observed
among the identified SNP candidates; these were
considered false positives and excluded from further
analysis. As a result, a total of 1 473 798 SNPs, consisting
of 836 676 transitions and 637 122 transversions, were
identified as confident biallelic SNP candidates (Fig. 1,
http://www.kazusa.or.jp/tomato), the accuracy of which
was validated using the GoldenGate assay described below.
Among these, 170 173 SNPs were found to be convertible
to CAPS markers (http://www.kazusa.or.jp/tomato), which
are a useful tool for conventional DNA polymorphic
analysis.
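The transition/transversion split above follows the standard definition: a transition exchanges a purine for a purine (A and G) or a pyrimidine for a pyrimidine (C and T), and every other substitution is a transversion. A minimal sketch with toy substitutions:

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def mutation_type(ref: str, alt: str) -> str:
    """Transition: purine<->purine or pyrimidine<->pyrimidine; otherwise transversion."""
    if ref == alt:
        raise ValueError("reference and alternate alleles are identical")
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


snps = [("A", "G"), ("C", "T"), ("A", "C"), ("G", "T"), ("T", "C")]
counts = {"transition": 0, "transversion": 0}
for ref, alt in snps:
    counts[mutation_type(ref, alt)] += 1
print(counts)  # {'transition': 3, 'transversion': 2}
```

For the 836 676 transitions and 637 122 transversions reported above, the transition/transversion ratio is about 1.31.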

Different numbers of SNPs with respect to H1706 were
observed in each line, e.g. 85 534 in PON, 85 670 in AIC,
120 329 in FRK, 245 730 in PL11, 710 904 in M82, and
1 102 982 in REG (Supplementary Table S3). SNP density
with respect to H1706 was calculated to be, on average,
one SNP per 516 bp (0.19%), and ranged from
1 SNP/689 bp (0.15%) in REG to 1 SNP/8884 bp (0.01%)
in AIC, assuming a 760 Mb genome size for SL2.40
(Supplementary Table S3). The SNPs were unevenly
distributed across the genomes, i.e. remarkably large
numbers of SNPs were observed on Chromosome 11
(Chr11) in PL11; Chr04, Chr05, and Chr11 in M82; and
Chr04, Chr05, and Chr12 in REG. At the chromosomal
segment level, the number of SNPs per 1-Mb window
ranged from 1 (88-89 Mb position of Chr01 in M82) to
10 847 (34-35 Mb position of Chr05 in REG) (Fig. 1).
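The density figures above come from dividing the assumed 760 Mb SL2.40 size by the SNP count, and the window counts in Fig. 1 from binning SNP positions into 1-Mb windows. A minimal sketch of both calculations; treating positions as 0-based is an assumption made for illustration.

```python
from collections import Counter

ASSEMBLY_BP = 760_000_000  # SL2.40 size assumed in the text


def bp_per_snp(n_snps: int, genome_bp: int = ASSEMBLY_BP) -> int:
    """Average spacing between SNPs, rounded to the nearest base pair."""
    return round(genome_bp / n_snps)


def window_counts(positions, window_bp: int = 1_000_000) -> Counter:
    """SNPs per fixed window; key 0 covers positions 0..window_bp-1, key 1 the next, etc."""
    return Counter(pos // window_bp for pos in positions)


print(bp_per_snp(1_473_798))  # 516 (all SNPs; ~0.19% of sites)
print(bp_per_snp(1_102_982))  # 689 (REG)
print(window_counts([150, 999_999, 1_000_000, 34_500_000]))  # Counter({0: 2, 1: 1, 34: 1})
```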

The identified SNP candidates were classified into seven
groups according to their positions in predicted genes on
the tomato genome sequence (see Section 2 for details). Of
the 1 473 798 SNP candidates, 998, 279, 110, and 1 were
redundantly mapped onto two, three, four, and five gene
models, respectively, while the other 1 472 410 SNPs were
positioned on a single gene model. As a result, a total of
1 475 688 SNP sites in gene models were targeted for
classification. Among them, 1 316 332 (89.2%) were in
intergenic spaces, corresponding to DNA sequences
located between genes. The other 159 356 SNPs (10.8%)
were in genic regions, of which 110 315 (7.5%) and
49 041 (3.3%) were in introns and exons, respectively
(Table 2). The number of SNPs potentially affecting gene
function was 22 805 (1.5%), including 156 SNPs at splice
sites in introns, 558 resulting in nonsense codons, and
22 091 resulting in missense codons.
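The counts in this paragraph and in Table 2 can be cross-checked arithmetically. A SNP that overlaps k gene models is classified k times, which is why there are more classified sites than SNPs. The sketch uses only numbers given in the text and in Table 2.

```python
# SNPs overlapping 1-5 gene models; a SNP in k models yields k classified sites.
snps_by_models = {1: 1_472_410, 2: 998, 3: 279, 4: 110, 5: 1}
n_snps = sum(snps_by_models.values())
n_sites = sum(k * n for k, n in snps_by_models.items())

# Category totals for the six lines (Table 2).
categories = {
    "intergenic": 1_316_332, "splice site": 156, "intron": 110_159, "UTR": 8_982,
    "synonymous": 17_410, "missense": 22_091, "nonsense": 558,
}
functional = categories["splice site"] + categories["missense"] + categories["nonsense"]

print(n_snps, n_sites, sum(categories.values()) == n_sites)  # 1473798 1475688 True
print(functional, round(100 * functional / n_sites, 1))      # 22805 1.5
```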

The functions of genes with and without the SNPs were
investigated. First, the 34 348 tomato genes predicted in
ITAG2.3 12 were classified into three groups: 508 genes
having nonsense SNPs (Group 1); 9436 genes having
nonsynonymous SNPs, including nonsense and missense
SNPs and SNPs at splice junctions (Group 2); and 24 404
genes not classified in Group 2 (Group 3). BLASTP was
then used to compare the protein sequences with those in
the KOG database. 19 A total of 15 974 predicted genes
were classified into KOG categories. The distributions of
the categories were similar between Groups 2 and 3
(Supplementary Fig. S1). In Group 3, on the other hand,
the proportions of Categories C (energy production and
conversion) and T (signal transduction mechanisms) were
relatively prominent, while those of Categories M (cell
wall/membrane/envelope biogenesis), O (post-translational
modification, protein turnover, and chaperones), U
(intracellular trafficking, secretion, and vesicular
transport), Y (nuclear structure), and Z (cytoskeleton)
were conversely low (Supplementary Fig. S1).

Table 1. Statistics of the re-sequenced genomes in the six tomato lines

Line   Number of read pairs   Total sequence length (bp)   Re-sequencing depth a (times)   % of genome coverage b   Coverage depth (times)
AIC    104 913 343            11 540 467 730               12.1                            93.5                     8.3
FRK     91 995 911            10 119 550 210               10.7                            93.2                     7.2
M82    107 226 071            11 794 867 810               12.4                            93.0                     8.3
PL11   121 752 304            13 392 753 440               14.1                            93.7                     9.7
PON     94 404 895            10 384 538 450               10.9                            93.4                     7.7
REG    188 026 505            20 682 915 550               21.8                            93.3                     14.1
Mean   118 053 172            12 985 848 865               13.7                            93.4                     9.2

a Re-sequencing depth = total sequence length/tomato genome size (950 Mb).
b Mapped percentage of the reference genome sequence (SL2.40, 760 Mb) at >1× coverage.

Figure 1. Density maps for SNPs detected in six tomato lines with respect to the reference tomato genome, SL2.40. The colours in each block
represent a continuum of SNP densities: low-to-high SNP densities are represented by green to red. Left-side elliptic bars indicate the tomato
chromosomes. Horizontal lines in each chromosome bar show mapped positions of SNPs used for the GoldenGate assay (black for intergenic
SNPs, red for SNPs at splice sites and intron SNPs, blue for SNPs at UTRs and synonymous SNPs, and green for missense SNPs and nonsense
SNPs). Heterochromatic regions are indicated by vertical lines on the right of the chromosomes. Names of genes identified by map-based
cloning in previous studies are shown on the right of the chromosomes.



Table 2. The number of SNPs categorized into seven classes

                                       Intragenic
                                       Intron                   Exon
                                                                        CDS
                                                                                      Non-synonymous
Line      Total       Intergenic   Splice site   Intron     UTR    Synonymous   Missense   Nonsense
AIC          85 721      70 707        26          9477      628     1769         3032        82
FRK         120 379     104 542        27        10 082      645     1799         3175       109
M82         710 986     672 467        60        25 951     1529     4035         6708       236
PL11        245 805     213 427        34        22 083     1438     3646         5054       123
PON          85 595      70 205        22          9847      661     1751         3017        92
REG       1 104 787     984 271       123        83 074     7102   13 157       16 647       413
6 lines   1 475 688   1 316 332       156       110 159     8982   17 410       22 091       558

3.3. SNP genotyping of tomato accessions by the Illumina GoldenGate assay

To select SNPs showing high polymorphism in the
accessions, the 1 473 798 SNPs were filtered by the
following criteria: (i) a LifeScope score of 0.000000; (ii) a
3:4 SNP segregation ratio in seven plant lines (H1706 and
the six re-sequenced lines), PL11-specific SNPs, or SNPs
specific to two lines including PL11 [the latter two
criteria were set because the PL11 line is considered to be
closely related to many modern F1 hybrid cultivars
(Fukuoka, personal communication)]; (iii) SNPs showing
different segregation patterns among the seven lines
within 3-cM windows covering whole genomes of a total
length of 1500 cM; 33 and (iv) an Illumina SNP score of
>0.6, as determined on the Illumina website
(https://icom.illumina.com). Using these criteria, 1235
SNPs were selected (Fig. 1 and Supplementary Table S4).
An additional 301 SNPs with MAFs of >0.3 and an
Illumina SNP score of 1.0 were selected based on data
reported in our previous studies. 13,23
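Criteria (ii) and (iv) above can be expressed as a simple filter over the seven lines. This is an illustrative sketch, not the authors' selection script: it reads a "3:4 segregation ratio" as three lines carrying one allele and four the other, it assumes homozygous calls without missing data, it omits criteria (i) and (iii), and the line order and function names are hypothetical.

```python
from collections import Counter

SEVEN_LINES = ("H1706", "AIC", "FRK", "M82", "PL11", "PON", "REG")


def passes_criterion_ii(alleles: dict) -> bool:
    """alleles: line -> allele letter for the seven lines (homozygous, no missing).
    True for a 3:4 split, a PL11-specific SNP, or a SNP specific to two lines incl. PL11."""
    counts = Counter(alleles[line] for line in SEVEN_LINES)
    if len(counts) != 2:
        return False                        # monomorphic among the seven lines
    minor_allele, minor_n = min(counts.items(), key=lambda item: item[1])
    if minor_n == 3:
        return True                         # 3:4 segregation ratio
    carriers = {line for line in SEVEN_LINES if alleles[line] == minor_allele}
    return minor_n <= 2 and "PL11" in carriers


def select_snp(alleles: dict, illumina_score: float) -> bool:
    """Criteria (ii) and (iv) only; (i) LifeScope score and (iii) 3-cM spacing are omitted."""
    return passes_criterion_ii(alleles) and illumina_score > 0.6


split_3_4 = dict(zip(SEVEN_LINES, "AAAAGGG"))
pl11_only = dict(zip(SEVEN_LINES, "AAAAGAA"))
reg_only = dict(zip(SEVEN_LINES, "AAAAAAG"))
print(select_snp(split_3_4, 0.8), select_snp(pl11_only, 0.8), select_snp(reg_only, 0.8))
# True True False
```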

A total of 663 tomato accessions (listed in
Supplementary Table S1) were genotyped with the 1536
SNPs using the GoldenGate assay. As a result, 1293 SNPs
were successfully genotyped in the 663 accessions,
satisfying the criteria of the GenomeStudio Data Analysis
software (Illumina). Of the 1293 SNPs, 1248 (96.5%) and
1147 (88.7%) were polymorphic within the 663
accessions at MAF thresholds of >0 and >0.05,
respectively (Supplementary Table S4). The MAF values of
the 1248 SNPs were evenly distributed from 0.001 to 0.5,
and no significant differences in the distribution of MAF
values among the seven SNP categories were observed
(data not shown). The ratios of heterozygous alleles and
null alleles were high in the seven F1 hybrids and the
three wild species, respectively (Supplementary Fig. S2).
The higher ratio of null alleles in the three wild species
could reflect polymorphisms at the probe annealing
sites. 23 In contrast, few heterozygous or null alleles were
observed in the 23 inbred lines.
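The MAF of each SNP is the frequency of its less common allele across the genotyped accessions, and counting SNPs above 0 and above 0.05 gives the two polymorphism figures reported here. A minimal sketch, assuming a hypothetical call encoding (one letter for a homozygote, two letters for a heterozygote, None for a missing or null call) rather than the GenomeStudio export format:

```python
from collections import Counter


def minor_allele_frequency(calls) -> float:
    """calls: one genotype per accession for a single SNP: 'A' for a homozygote,
    'AG' for a heterozygote, None for a missing or null call. Returns the MAF."""
    counts = Counter()
    for call in calls:
        if call is None:
            continue
        if len(call) == 1:
            counts[call] += 2               # homozygote contributes two allele copies
        else:
            counts.update(call)             # heterozygote: one copy of each allele
    total = sum(counts.values())
    if total == 0 or len(counts) < 2:
        return 0.0                          # monomorphic or no data
    return min(counts.values()) / total


snp_calls = [
    ["A"] * 18 + ["G", "AG", None],         # MAF = 3/40 = 0.075
    ["C"] * 39 + ["T"],                      # MAF = 2/80 = 0.025
    ["T"] * 40,                              # monomorphic
]
mafs = [minor_allele_frequency(c) for c in snp_calls]
print(sum(m > 0 for m in mafs), sum(m > 0.05 for m in mafs))  # 2 1
```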

3.4. Clustering analyses of the tomato accessions 

The genetic distances between all pairs of the 663
tomato accessions were calculated based on the genotypes
of the 1248 SNPs. The genetic distances among the 663
accessions ranged from 0.00 to 0.72, with an average of
0.39. No obvious clusters were observed in the
dendrogram of the genetic distances (Supplementary Fig.
S3A). To evaluate this result, the genetic relationships
between the accessions were determined by PCA, which
also showed no clusters in the 663 lines (Supplementary
Fig. S3B); the proportions of variance explained by PC1,
PC2, and PC3 were only 0.09, 0.06, and 0.05, respectively.
Genetic relationship analysis using the STRUCTURE
software indicated that there was no population structure
in the accessions (Supplementary Fig. S3C). This contrasts
with the six clusters identified by the delta-K method
described by Evanno et al. 30

3.5. Linkage disequilibrium and haplotype identification

Because no clear genetic structure was observed in the
663 accessions, LD across the tomato genome in these
lines was investigated (Fig. 2, Supplementary Figs S4
and S5). A total of 123 LD blocks, i.e. chromosome
segments showing significant LD (based on the definition
of Gabriel et al. 32) between each pair of SNPs located
within them, were observed across chromosomes
(Supplementary Table S4). The 123 LD blocks comprised
a total of 458 SNPs. The average length of the LD blocks
was 3.2 Mb, ranging from 256 bp in Chr10 between
solcap_snp_sl_8260 and SL2.40ch10_59989140W to
58.3 Mb in Chr01 between SL2.40ch01_7886746R and
SL2.40ch01_66149134Y (Supplementary Table S4). LD
blocks containing heterochromatic regions (average,
13.8 Mb) were longer than those in euchromatic regions
(average, 58 kb) (Supplementary Table S4).

A total of 437 haplotypes were identified in the 123 LD
blocks. An LD block had an average of 3.6 haplotypes
consisting of an average of 3.7 SNPs (data not shown).
Subsequently, 308 tag SNPs, the minimum SNP subset
required for distinguishing haplotypes, were selected
from the 458 SNPs located in the 123 LD blocks
(Supplementary Table S4).
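Following the definition above, tag SNPs are the smallest SNP subset that still distinguishes all haplotypes in a block. Because blocks here contain only a few SNPs (3.7 on average), an exhaustive search is feasible. The following is an illustrative sketch of that definition with a hypothetical block, not necessarily how Haploview selects tags.

```python
from itertools import combinations


def minimal_tag_snps(haplotypes):
    """haplotypes: equal-length allele strings observed in one LD block.
    Returns the smallest list of SNP indices whose alleles still distinguish every
    distinct haplotype (exhaustive search, exponential in the number of SNPs)."""
    distinct = set(haplotypes)
    n_snps = len(haplotypes[0])
    for size in range(n_snps + 1):
        for cols in combinations(range(n_snps), size):
            projected = {tuple(h[i] for i in cols) for h in distinct}
            if len(projected) == len(distinct):
                return list(cols)
    return list(range(n_snps))  # not reached: all columns always separate distinct haplotypes


block = ["AGTC", "AGTT", "GCAT"]   # hypothetical: 3 haplotypes over 4 SNPs
print(minimal_tag_snps(block))      # [0, 3]
```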

3.6. GWAS for agronomic traits in genetic resources

GWAS identified a total of nine SNP loci that were
significantly associated with eight morphological traits
recorded in the NIVTS and NIAS Genebanks (Fig. 3,
Table 3, and Supplementary Fig. S6). The eight traits were
phenotyped as actual measured numeric data, qualitative
data, and ranked data, and comprised inflorescence
branching (nine ranks), plant habit determinate
(indeterminate or determinate), plant height (cm),
number of leaves between inflorescences (number of
leaves), fruit size (10 ranks), locule number (five ranks),
green shoulder on immature fruit (10 ranks), and colour
of the fruit epidermis (colourless or yellow), for which the
numbers of scored lines were 476, 478, 457, 111, 479,
474, 452, and 137, respectively (Supplementary Table S1
and Fig. S7).

Figure 2. LD measures: r² values plotted against physical distance (Mb)
between all pairs of SNPs located on the same chromosome.

Among the eight traits, inflorescence branching was
associated with two SNP loci, SL2.40ch02_41751976Y and
solcap_snp_sl_39457 (Fig. 3 and Table 3). The SNP
SL2.40ch02_41751976Y, which does not belong to any LD
block, was located 4.8 and 1.6 Mb from the previously
identified S (Solyc02g077390) and AN (Solyc02g081670)
genes, respectively, which are involved in compound
inflorescence. 34 The other seven morphological traits
were significantly associated with seven SNP loci (Table 3
and Supplementary Fig. S6). Of these seven SNPs, none of
which belongs to an LD block, five were located near
previously identified genes responsible for the targeted
traits: SL2.40ch06_42601581W, located 240 kb from SP
(Solyc06g074350), 35 which is associated with plant habit
determinate, plant height, and the number of leaves
between inflorescences; SL1_00sc6004_2094360_solcap_snp_sl_44897,
located 31 kb from FAS (Solyc11g071810), 36 which is
associated with fruit size and locule number;
SL2.40ch02_41172086R, located 594 kb from LC
(Solyc02g083940 and/or Solyc02g083950) 37 and 1.8 Mb
from OVATE (Solyc02g085500), 38 which are associated
with locule number; SL2.40ch10_1539862R, located
753 kb from U (Solyc10g008160), 39 which is associated
with green shoulder on immature fruit; and
SL2.40ch01_71279371Y, located 24 kb from Y
(Solyc01g079620), 40 which is associated with colour of
the fruit epidermis. To investigate the associations
between these candidate genes and the traits, polymorphic
analysis of SP, FAS, LC, OVATE, and U was performed
(Supplementary Table S5). The replicated GWAS including
the five loci showed that SP, U, and FAS were strongly
associated with these traits (Table 3).

Figure 3. SNPs associated with inflorescence branching identified by GWAS. Distribution of SNPs associated with inflorescence branching. SNPs
that were significantly associated (-log P of >4.36 at the 5% significance level) are indicated by arrows.

Table 3. Effects of associated SNPs on the traits

Trait                                  Associating SNP                            Chromosome   Position     -log P a   Additive effect b   Dominant effect b   Candidate gene
Inflorescence branching                SL2.40ch02_41751976Y                       SL2.40ch02   41 751 976    4.4*        0.2                 0.2                S and AN
                                       solcap_snp_sl_39457                        SL2.40ch09    4 904 111    5.4**      -0.3                -0.4
No. of leaves between inflorescences   SL2.40ch06_42601581W                       SL2.40ch06   42 601 581    5.4**      -0.4                 0.4                SP
                                       SP                                         SL2.40ch06   42 362 163    7.4**      -0.7                 0.4
Plant habit determinate                SL2.40ch06_42601581W                       SL2.40ch06   42 601 581   26.2**       0.2                -0.3                SP
                                       SP                                         SL2.40ch06   42 362 163   28.8**       0.3                -0.4
Plant height                           SL2.40ch06_42601581W                       SL2.40ch06   42 601 581   11.2**      -7.7                21.1                SP
                                       SP                                         SL2.40ch06   42 362 163   11.6**     -13.5                25.6
                                       solcap_snp_sl_16654                        SL2.40ch09    2 135 101    7.5**      -1.0                31.6
Fruit size                             SL1_00sc6004_2094360_solcap_snp_sl_44897   SL2.40ch11   52 280 215    4.6*      -10.9                 9.1                FAS
                                       FAS                                        SL2.40ch11   52 252 771    8.4**     -33.3                12.2
Locule number                          SL2.40ch02_41172086R                       SL2.40ch02   41 172 086    4.9*       -0.4                -0.6                LC and OVATE
                                       SL1_00sc6004_2094360_solcap_snp_sl_44897   SL2.40ch11   52 280 215    7.6**      -0.5                -0.2                FAS
                                       FAS                                        SL2.40ch11   52 252 771   10.1**      -1.2                 0
Green shoulder                         SL2.40ch01_89266983Y                       SL2.40ch01   89 266 983    4.3*       -0.3                 0.9
                                       SL2.40ch10_1539862R                        SL2.40ch10    1 539 862    5.5**      -0.4                 1.2                U
                                       U                                          SL2.40ch10    2 292 260   20.9**      -1.5                 1
Colour of fruit epidermis              SL2.40ch01_71279371Y                       SL2.40ch01   71 279 371    g 0**       0.2                Not detected       Y

Genes associated with the traits in the replicated GWAS (SP, FAS, and U) are listed by gene name in the Associating SNP column.
a ** and * indicate significance levels of 1 and 5%, respectively.
b Effect of the 'Heinz 1706' allele.
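Table 3 reports additive and dominant effects of the 'Heinz 1706' allele as estimated within the TASSEL MLM. For orientation, the textbook class-mean definitions are sketched below: a is half the difference between the two homozygous class means, and d is the deviation of the heterozygous mean from their midpoint. This sketch ignores kinship, so it will not in general reproduce the MLM estimates. Coding genotypes as the number of H1706 alleles, the toy phenotype values, and the function name are all assumptions.

```python
from statistics import mean


def additive_dominance(values, dosage):
    """values: phenotype per accession; dosage: copies of the 'Heinz 1706' allele (2, 1, or 0).
    Returns (a, d) with a = (mean_2 - mean_0) / 2 and d = mean_1 - (mean_2 + mean_0) / 2;
    d is None when no heterozygous accessions are present."""
    by_class = {g: [v for v, k in zip(values, dosage) if k == g] for g in (0, 1, 2)}
    m2, m0 = mean(by_class[2]), mean(by_class[0])
    a = (m2 - m0) / 2
    d = mean(by_class[1]) - (m2 + m0) / 2 if by_class[1] else None
    return a, d


print(additive_dominance([10.0, 12.0, 20.0, 22.0, 18.0], [2, 2, 0, 0, 1]))  # (-5.0, 2.0)
print(additive_dominance([10.0, 20.0], [2, 0]))                             # (-5.0, None)
```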



4. Discussion 

The re-sequencing analysis presented here identified
a large number of SNP candidates in the cultivated
tomato, S. lycopersicum, in which DNA polymorphisms
have been difficult to detect.4,23 This has been attributed
to its narrow genetic diversity, caused by the genetic
bottlenecks that occurred during its domestication,
cultivation, and breeding.41 The intraspecies SNP density
of 0.19% was approximately three times lower than the
interspecies density of 0.6% between S. lycopersicum and
S. pimpinellifolium.12 As reported by Asamizu et al.,42 the
SNPs were not evenly distributed across the reference
sequence of H1706 (Fig. 1 and Supplementary Table S2).
In the H1706 genome, large introgressions are observed
on Chr04, 09, 11, and 12, which have been linked to the
introduction of disease resistance loci into H1706 from
S. pimpinellifolium.12 The biased SNP density observed in
this study likewise suggests that genome segments from
wild relatives were introgressed during tomato breeding
for disease resistance.12
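The density comparison above is plain arithmetic. As a minimal sketch, assuming "SNP density" means SNPs per aligned base expressed as a percentage, the snippet below converts the two reported densities into a fold difference and an average spacing between SNPs; the function name `snp_spacing_bp` is hypothetical.

```python
def snp_spacing_bp(density_percent: float) -> float:
    """Average distance (bp) between SNPs for a density given as % of bases (assumed definition)."""
    return 100.0 / density_percent

INTRA = 0.19  # % SNP density within S. lycopersicum (this study)
INTER = 0.6   # % SNP density between S. lycopersicum and S. pimpinellifolium (ref. 12)

fold = INTER / INTRA
print(f"{fold:.1f}-fold")  # → 3.2-fold
print(f"{snp_spacing_bp(INTRA):.0f} bp vs {snp_spacing_bp(INTER):.0f} bp")  # → 526 bp vs 167 bp
```

Under this reading, one SNP every ~526 bp within the cultivated species compares with one every ~167 bp between species.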

SNPs are abundant sequence alterations that can
affect gene function. Among the seven inbred tomato
lines (including H1706), 558 nonsense, 22 091
missense, and 17 410 synonymous SNPs were found
in 508 (1.5%), 9285 (26.7%), and 7825 (22.5%) of the
34 727 predicted genes, respectively (Table 2).
Between S. lycopersicum and S. pimpinellifolium, 3.5,
36.3, and 37.0% of genes contain nonsense, missense,
and synonymous mutations, respectively.12 The ratio of
the interspecies to the intraspecies proportion of genes
carrying nonsense mutations is therefore 2.3, whereas the
corresponding ratios for missense and synonymous
mutations are 1.4 and 1.6, respectively. This result
suggests that wild-relative alleles carrying SNPs that
critically disrupt gene function, i.e. nonsense SNPs, have
been negatively selected when the gene pool of wild
relatives was exploited for breeding.
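The three ratios can be reproduced from the gene counts in Table 2 and the interspecies percentages from ref. 12. The sketch below is a hedged reconstruction: it assumes each intraspecies percentage is first rounded to one decimal, as reported (1.5% rather than 1.46%), which is what yields 2.3 for nonsense SNPs; the unrounded value would give about 2.4. The helper names are hypothetical.

```python
TOTAL_GENES = 34_727  # predicted genes (Table 2)

# Genes carrying each SNP class among the seven inbred lines (Table 2)
intra_genes = {"nonsense": 508, "missense": 9285, "synonymous": 7825}

# Percentage of genes carrying each class between S. lycopersicum and S. pimpinellifolium (ref. 12)
inter_percent = {"nonsense": 3.5, "missense": 36.3, "synonymous": 37.0}


def percent_of_genes(n_genes: int, total: int = TOTAL_GENES) -> float:
    """Percentage of predicted genes, rounded to one decimal as in Table 2 (assumption)."""
    return round(100.0 * n_genes / total, 1)


def inter_intra_ratios() -> dict:
    """Ratio of interspecies to intraspecies percentages for each SNP class."""
    return {
        cls: round(inter_percent[cls] / percent_of_genes(n), 1)
        for cls, n in intra_genes.items()
    }


print(inter_intra_ratios())  # → {'nonsense': 2.3, 'missense': 1.4, 'synonymous': 1.6}
```

The larger ratio for nonsense SNPs is the quantity the depletion argument in the text rests on.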

The tomato accessions used in this study encompassed
broad genetic diversity (Supplementary Figs S2 and
S3). Genome-wide LD analysis based on these accessions
revealed that the extent of LD depended on the nature
of the chromatin (Supplementary Fig. S4 and
Supplementary Table S4). Similar observations have been
reported for interspecific F2 mapping populations in
tomato, which indicated that chromosome
recombination in heterochromatin is strongly sup- 
pressed compared with that in euchromatin. ,43 LD 
analysis of whole genomes has previously been performed
not only for tomato but also for rice, soybean, and
Arabidopsis.10,13,44-46 However, chromatin-specific LD
has not been investigated. It is therefore expected that
chromosome recombination across the genome does not
differ appreciably between the accessions and
biparental mapping populations.
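The exact procedure used to estimate the LD decay distances (58 kb and 13.8 Mb) is not described in this section. As a hedged illustration only, the sketch below computes pairwise r² between biallelic SNPs coded 0/1 across homozygous (inbred) accessions, which can be treated as haplotypes, and then reports the first distance bin whose mean r² falls below a threshold. The bin size, the threshold, and the function names are hypothetical; published analyses often fit a decay curve instead of binning.

```python
from statistics import mean


def r_squared(snp_a, snp_b):
    """r^2 between two biallelic SNPs coded 0/1 across the same homozygous accessions."""
    n = len(snp_a)
    p_a = sum(snp_a) / n
    p_b = sum(snp_b) / n
    # Frequency of the haplotype carrying allele 1 at both loci
    p_ab = sum(1 for a, b in zip(snp_a, snp_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("monomorphic SNP: r^2 is undefined")
    return d * d / denom


def ld_decay_distance(pairs, bin_size, threshold):
    """pairs: iterable of (distance_bp, r2) for SNP pairs.

    Returns the lower bound (bp) of the first distance bin whose mean r^2
    is below the threshold, or None if no bin falls below it.
    """
    bins = {}
    for dist, r2 in pairs:
        bins.setdefault(dist // bin_size, []).append(r2)
    for b in sorted(bins):
        if mean(bins[b]) < threshold:
            return b * bin_size
    return None
```

Run separately on SNP pairs from euchromatic and heterochromatic regions, a procedure of this kind would expose the chromatin-dependent contrast described above; with real data the result depends strongly on the chosen threshold.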

GWAS revealed SNPs that were associated with agronomically
important traits (Fig. 3, Table 3, and Supplementary Fig.
S6), and three genes (FAS, SP, and U) were found to underlie
trait variation (Table 3). Although these genes were
previously identified by map-based cloning using interspecific
populations segregating for phenotypic differences between
cultivated tomato and its wild relatives, the present results
indicate that they are also responsible for phenotypic
variation within cultivated tomato. The identified SNPs could
therefore serve as effective markers for marker-assisted
selection in breeding. However, no significant SNP association
was detected for most of the traits registered in the NIVTS
and NIAS Genebanks. Two possibilities can be advanced to
explain this lack of significant associations. First, the SNP
density was insufficient for GWAS. Although 1248 SNPs were
employed in this study, the LD extent in the gene-rich
euchromatic regions (58 kb) was too short to be covered by
the SNP density employed (1 SNP/213 kb in euchromatin;
Supplementary Table S4). This analysis suggests that >3228
and >41 SNPs in euchromatic and heterochromatic regions,
respectively, would be required to obtain high-resolution
results from GWAS. Second, most of the traits were scored on
1-5 or 1-10 scales rather than measured quantitatively.
Because scoring standards may vary between individual
investigators, the accuracy of these scores is unlikely to be
sufficient for GWAS. One reason for the successful detection
of SNP associations with the eight morphological traits may be
that these SNPs have large effects on phenotypic variation.
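The marker-density argument can be made concrete with a little arithmetic. The sketch below assumes, as one simple reading not stated in the text, that the required number of markers is the number of LD-decay-sized intervals in a region; the function names and the 100 Mb region are hypothetical.

```python
import math

EU_LD_DECAY = 58_000        # bp, LD decay in euchromatin (this study)
HET_LD_DECAY = 13_800_000   # bp, LD decay in heterochromatin (this study)
EU_SPACING = 213_000        # bp per SNP in euchromatin with the 1248-SNP panel (this study)


def markers_needed(region_bp: int, ld_decay_bp: int) -> int:
    """Number of LD-decay-sized intervals in a region (assumed criterion for coverage)."""
    return math.ceil(region_bp / ld_decay_bp)


def density_shortfall(current_spacing_bp: float, ld_decay_bp: float) -> float:
    """How many times denser markers would need to be to match the LD decay distance."""
    return current_spacing_bp / ld_decay_bp


print(f"{density_shortfall(EU_SPACING, EU_LD_DECAY):.1f}x")  # → 3.7x
# Hypothetical 100 Mb region, for illustration only:
print(markers_needed(100_000_000, EU_LD_DECAY), markers_needed(100_000_000, HET_LD_DECAY))  # → 1725 8
```

Read in reverse under the same assumption, >3228 markers at 58 kb implies roughly 187 Mb of euchromatin, and >41 markers at 13.8 Mb implies roughly 566 Mb of heterochromatin. These region sizes are back-calculated here and not reported in this section.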

In this study, we demonstrated that genetic resource
accessions can be used for GWAS, i.e. there is no need
to establish a specific mapping population through the
labour-intensive process of performing crosses and
advancing generations. In addition, a core collection would
be more effective for GWAS, as it would reduce the labour
and cost associated with high-density whole-genome
genotyping and replicated phenotyping. In barley, GWAS was
used to detect SNP associations with agronomic traits in a
worldwide collection.47 The establishment of core
collections for tomato, whose contents could be adjusted
depending on the purpose,48 would enable the identification
of valuable loci for molecular genetics and breeding.
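How a tomato core collection would be assembled is not specified here. One common idea is to choose accessions that together retain as many SNP alleles as possible. The sketch below is a simple greedy allele-coverage heuristic, offered as a swapped-in illustration rather than the authors' method. The input format (accession name mapped to a list of allele calls, with None for missing data) and the name `greedy_core` are hypothetical.

```python
def greedy_core(genotypes, max_size):
    """Greedily pick accessions that add the most not-yet-covered (SNP index, allele) pairs.

    genotypes: dict mapping accession name -> list of allele calls (None = missing).
    Stops at max_size accessions or when no remaining accession adds a new allele.
    Ties go to the accession that appears first in the dict.
    """
    def alleles(calls):
        return {(i, a) for i, a in enumerate(calls) if a is not None}

    covered = set()
    core = []
    remaining = dict(genotypes)
    while remaining and len(core) < max_size:
        best, best_gain = None, 0
        for acc, calls in remaining.items():
            gain = len(alleles(calls) - covered)
            if gain > best_gain:
                best, best_gain = acc, gain
        if best is None:  # nothing left adds a new allele
            break
        covered |= alleles(remaining.pop(best))
        core.append(best)
    return core


example = {"A": [0, 0, 0], "B": [1, 1, 0], "C": [0, 1, 1], "D": [0, 0, 0]}
print(greedy_core(example, max_size=10))  # → ['A', 'B', 'C']
```

Changing `max_size`, or weighting alleles by trait relevance, is one way a core collection's contents could be tailored to a given purpose.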

In conclusion, the usefulness of GWAS was demonstrated
by analysing a large SNP data set obtained from the
re-sequencing data. This study represents an important step
forward in the genomics, genetics, and breeding of
cultivated tomato.

5. Availability

Nucleotide sequence data reported are available in the
DDBJ Sequence Read Archive (BioProject PRJDB1397)
under the accession numbers DRA001017 (AIC),
DRA001018 (FRK), DRA001019 (M82), DRA001020
(PL11), DRA001021 (PON), and DRA001022 (REG).
Details of the SNPs and genotypes of the investigated
genetic resources are available at the Kazusa Tomato
Genomics DataBase (KaTomicsDB: http://www.kazusa.or.jp/tomato).